Abstract

We study a reduced quantum circuit computation paradigm in which the only
allowable gates either permute the computational basis states or else apply a
"global Hadamard operation", i.e. apply a Hadamard operation to every qubit
simultaneously. In this model, we discuss complexity bounds (bounding the
number of global Hadamard operations) for common quantum algorithms: we
illustrate upper bounds for Shor's Algorithm, and prove lower bounds for
Grover's Algorithm. We also use our formalism to display a gate that is
neither quantum-universal nor classically simulable, on the assumption that
Integer Factoring is not in BPP.



1 Introduction 

A Quantum Circuit (or Quantum Logic Network) is usually presented as
being composed both of wires that carry qubits and of gates that tap those
wires to modify the qubits they carry [S]. In Section 2 we specify the
notation used to describe computation with quantum circuits, and specify
exactly which features we allow within the kinds of circuits we wish to
consider.

The main focus is to enquire about the difference it makes if we limit
ourselves to using 'classical' gates (ones which preserve the set of
computational basis states) and 'global Hadamard transforms' (where a
Hadamard gate is applied once to each qubit). We will show in Section 3 that
our imposed limitations do not limit computational power in any real sense,
and in subsequent sections will discuss what algorithms and
algorithm-primitives tend to look like within this model. This model is
closely related to the complexity class Fourier Hierarchy, defined in [?].
The motivation comes not from physical considerations pertinent to the task
of fabricating a quantum information processor, but from the desire to
analyse a fairly natural-looking measure of circuit complexity that is not
apparent within the standard model, viz. the number of these global Hadamard
transformations needed. The reason for requiring that the Hadamard operations
be applied on every qubit is that we're not necessarily thinking of actively
applying them, so much as just passively changing what is meant by the
computational basis, and therefore changing the interpretation of future
gates or measurements. More will be said on this in Section 4.4.

Section 4 will discuss the idea of replacing the Fourier Transform in
Shor's Algorithm with the kind of transform that is simplest within our
limited model, and look at how this affects the gate-complexity of the
algorithm. The concept of Order Finding, on which Shor's Algorithm is based,
turns out to be a very natural primitive within the present context.

In Section 5 we extend a result of [?] to show the trade-offs in different
complexity measures relevant to Grover's Algorithm. This is undertaken
in a similar spirit to the work of [?] showing why Grover's algorithm is
essentially non-parallelisable.



2 Notation and Terminology 

A circuit incorporates a finite number of wires, which run right through
the circuit, not terminating prematurely (i.e. no 'adaptive' measurements).
The width of a circuit is taken to be the number of such qubit wires used.

The input to the circuit is taken to be a (classical description of the) state 
in which the wires are initialised, usually a simple computational basis state. 

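Each wire, i.e. qubit, codes quantum data with respect to a tensor
component within the full space $\mathbb{C}^{2^n}$ that is associated with
the totality of n qubits, and these component spaces are given a
computational basis (Z) and a Hadamard basis (X), both of which are
orthonormal:

$$
\begin{aligned}
Z\text{-basis} &:= \left\{\, |0\rangle,\ |1\rangle \,\right\}, \\
X\text{-basis} &:= \left\{\, |+\rangle := H|0\rangle := \tfrac{|0\rangle + |1\rangle}{\sqrt{2}},\ \
|-\rangle := H|1\rangle := \tfrac{|0\rangle - |1\rangle}{\sqrt{2}} \,\right\}.
\end{aligned}
\tag{1}
$$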

The data resident on a circuit may equally well be described by a trace-1
Hermitian density operator, which will be of rank 1 if the state is pure. We
will use an algebra of Pauli symbols

$$
1 := \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
X := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad
Y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad
Z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\tag{2}
$$

to describe such density operators explicitly, where necessary. Subscripts
on these symbols will distinguish between different qubits. The same Pauli
symbols may be used to describe unitary transforms (conjugations of the
density operator by unitary actions), so for example the C-Not gate may be
written

$$
\Lambda_c(X_t) = \tfrac{1}{2}\left(1 + Z_c + X_t - Z_c \cdot X_t\right), \tag{3}
$$

where c and t label the control and target qubits respectively. More
generally, the generalised Toffoli gate may be written

$$
\Lambda_C(X_T) = 1 - 2 \prod_{c \in C} \frac{1 - Z_c}{2} \prod_{t \in T} \frac{1 - X_t}{2}. \tag{4}
$$

In classical terms, these gates flip each of the target bits if each of the
control bits is set to 1, and otherwise act as the identity. The two sets of
qubits C and T must, of course, be disjoint, and T must not be empty. (To
see that $\Lambda$ is unitary, observe that it is the identity minus twice a
product of commuting projectors, and hence geometrically a reflection.) We
call the number of such gates used the size of the circuit (other authors
give similar, though different, definitions of size, but this point will not
be important in this discussion).
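The Pauli expansion of the C-Not gate, and its description as "identity minus twice a product of commuting projectors", can be checked by direct matrix arithmetic. The following is a minimal sketch for one control and one target only; the helper names (`kron`, `add`, `scale`, `matmul`) are illustrative and not part of the paper's formalism, and the control is assumed to be the first (most significant) qubit.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def add(*ms):
    """Entry-wise sum of several matrices of equal shape."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]

def scale(s, m):
    """Multiply every entry of matrix m by the scalar s."""
    return [[s * x for x in row] for row in m]

def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

II = kron(I2, I2)
Zc = kron(Z, I2)      # Z acting on the control qubit
Xt = kron(I2, X)      # X acting on the target qubit
ZcXt = kron(Z, X)

# Pauli expansion: (1 + Z_c + X_t - Z_c X_t) / 2
cnot_from_paulis = scale(0.5, add(II, Zc, Xt, scale(-1, ZcXt)))

# Reflection form: 1 - 2 * (1 - Z_c)/2 * (1 - X_t)/2
Pc = scale(0.5, add(II, scale(-1, Zc)))
Pt = scale(0.5, add(II, scale(-1, Xt)))
cnot_from_projectors = add(II, scale(-2, matmul(Pc, Pt)))

# Textbook C-Not in the basis |00>, |01>, |10>, |11>
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```

Both constructions agree with the usual permutation matrix, which is why the gate can be treated either as a classical bit-flip or as a reflection.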

As well as allowing for Generalised Toffoli gates (to be used an arbitrary
number of times on arbitrary qubits within a circuit), we also allow for a
Hadamard map to be applied simultaneously to every qubit in the circuit:

$$
H^{\infty} := \prod_{j} H_j, \tag{5}
$$

where the product is taken over every index, without exception. We use
the expression quantum-depth to count the number of such operations used
within a circuit.

For the purposes of obtaining computational outcomes, projective single- 
qubit measurements in the computational basis may be applied to the quan- 
tum output of a circuit. We will use the term output to refer to the sublist of 
the measured qubits that carry the data in which we're interested, since in 
many circuit designs it happens that many wires don't output useful data. 

To avoid unnecessary discussion at the bit-level, we will use the term 
register to denote a list of wires, and output register to denote those wires 
that carry useful data at the end of the circuit. If a circuit has been designed 
to be useful with different inputs, then the term input register will be used 
to refer to those wires that are initialised differently on different runs of the 
circuit, and the term ancilla register will refer to those wires that are always 
initialised in the same way for each run. If a circuit is not designed to be 
used with different inputs, then the term ancilla register is used instead to 
refer to the qubits that aren't in the output register. 
3 Universality



It isn't too hard to see that there are constant-width and constant-depth
circuits which can be used to implement local Hadamard transforms within
this model, with certain ancillae used as catalysts. For example, the circuit

$$
\Lambda_{cd}(X_b) \cdot H^{\infty} \cdot \Lambda_{cd}(X_a) \cdot H^{\infty} \cdot \Lambda_{cd}(X_b) \tag{6}
$$

clearly has effects which are local to qubits a, b, c, d. In particular, it
maps states as follows:

$$
\begin{aligned}
|1{-}00\rangle &\mapsto |1{-}{+}{+}\rangle \\
|1{-}01\rangle &\mapsto |1{-}{-}{+}\rangle \\
|1{-}10\rangle &\mapsto |1{-}{+}{-}\rangle \\
|1{-}11\rangle &\mapsto |1{-}{-}{-}\rangle.
\end{aligned}
\tag{7}
$$

Therefore, we could take qubits a, b, c to be an ancilla in the state
$|1{-}?\rangle$ (the question-mark denoting a totally arbitrary qubit), and
use the circuit above, followed or preceded by a swap on qubits c and d, to
achieve a local Hadamard transform on qubit d while returning the other
qubits to their former states. (Strictly speaking, the data on qubit c is
not returned to its former state unless that state were fully mixed. But if
we never 'care' about that qubit, then no problem arises.) Notice also that
the swap gate can be constructed from three C-Not gates, which are amongst
the Generalised Toffolis. (It is interesting that an ancilla was necessary
for this construction.)
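The mapping in (7) can be checked with a small state-vector simulation that uses only the two gate types of the model: basis permutations and the global Hadamard (a normalised Walsh-Hadamard transform). This is a sketch under stated assumptions: the circuit is read as $\Lambda_{cd}(X_b)$, then $H^{\infty}$, then $\Lambda_{cd}(X_a)$, then $H^{\infty}$, then $\Lambda_{cd}(X_b)$, qubits are ordered a, b, c, d from left to right, and all function names are illustrative.

```python
import math

R = 1 / math.sqrt(2)
KETS = {"0": (1.0, 0.0), "1": (0.0, 1.0), "+": (R, R), "-": (R, -R)}

def product_state(label):
    """Amplitude list for a product state such as '1-00' (leftmost qubit first)."""
    state = [1.0]
    for ch in label:
        state = [amp * k for amp in state for k in KETS[ch]]
    return state

def global_hadamard(state):
    """H on every qubit, i.e. a normalised Walsh-Hadamard transform."""
    out, h = list(state), 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                out[i], out[i + h] = out[i] + out[i + h], out[i] - out[i + h]
        h *= 2
    return [amp / math.sqrt(len(out)) for amp in out]

def toffoli(state, controls, target):
    """Generalised Toffoli with a single target: a permutation of basis states."""
    n = len(state).bit_length() - 1
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        if all((idx >> (n - 1 - c)) & 1 for c in controls):
            idx ^= 1 << (n - 1 - target)
        out[idx] = amp
    return out

A, B, C, D = 0, 1, 2, 3  # qubit positions, leftmost first

def circuit_6(state):
    """Three Toffoli gates interleaved with two global Hadamards."""
    state = toffoli(state, (C, D), B)
    state = global_hadamard(state)
    state = toffoli(state, (C, D), A)
    state = global_hadamard(state)
    return toffoli(state, (C, D), B)

def close(u, v, tol=1e-9):
    """Entry-wise comparison of two amplitude lists."""
    return all(abs(x - y) < tol for x, y in zip(u, v))
```

With the ancilla pair held in $|1{-}\rangle$, each Toffoli targeting a $|-\rangle$ wire acts as a controlled sign flip on c and d, so the net effect on those two qubits is a swap composed with a Hadamard on each.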

We should point out that although Toffoli gates and local Hadamard 
gates are insufficient for achieving all unitary transforms directly, they are 
well known to constitute a Universal Set of computing gates, and so 
the limitations that we have imposed should not be thought of as especially 
restrictive. Yet one does indeed seem to end up with some limitations on 
computing power if it is not possible to initialise ancilla qubits with both 
X- and Z-basis elements. 



4 Circuits with Small Quantum-Depth 

4.1 A Modification to Shor's Algorithm 

In this section, we will show how Shor's Algorithm may, for all practical
purposes, be divided into two parts: the first performs some order-finding
operations, and the second converts 'order-finding data' into an actual
problem answer (and is called postprocessing). We will show that each of
these parts on its own can be performed efficiently without using any
Hadamard maps, though Hadamard maps are necessary within the interface.
(While the division of Shor's Algorithm into two parts is not in itself a
novel concept, it is hoped that a detailed analysis of its implementation
within this particular model has something useful to teach about its
algorithmic complexity.)

4.2 Order Finding 

For Order-Finding, which is the first part of our algorithm, let there be an
ancilla register whose contents will code for elements of some presentation
of a cyclic group $G = \langle g \rangle$, when in the computational Z-basis,
and which more generally codes for superpositions of group elements. Let
there be an input register that starts out with each qubit in the pure state
$|+\rangle$.

Let U be the map on the ancilla register that carries $|x\rangle$ to
$|x \cdot g\rangle$, which (by hypothesis, and by universality of Toffoli
gates for classical computation) can be constructed from a reasonable number
of (generalised) Toffoli gates. We shouldn't generally care how U maps
vectors that aren't in the span of the coding for G (which necessarily exist
whenever |G| is not a power of 2), so to keep matters simple we shall require
that U acts as the identity on computational basis vectors that don't code
for elements of G. Again, it is straightforward to find efficient Toffoli
circuitry for doing this by simply taking classical circuitry for the same
problem and making it reversible.
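As a concrete illustration of the map U, the sketch below builds it as a permutation table for the case where G is generated by g under multiplication modulo some integer. The choice of modular multiplication as the presentation of G, and the names `group_elements` and `multiply_permutation`, are assumptions made for this example; it takes gcd(g, modulus) = 1 so that g really does generate a cyclic group.

```python
import math

def group_elements(g, modulus):
    """Elements g, g^2, ..., g^r = 1 of the cyclic group generated by g."""
    if math.gcd(g, modulus) != 1:
        raise ValueError("g must be invertible modulo the modulus")
    elems, x = [], 1
    while True:
        x = (x * g) % modulus
        elems.append(x)
        if x == 1:
            return elems

def multiply_permutation(g, modulus, n_bits):
    """U as a table on n-bit labels: x -> x*g mod modulus on labels that code
    for group elements, and the identity on every other label."""
    group = set(group_elements(g, modulus))
    return [(x * g) % modulus if x in group else x for x in range(2 ** n_bits)]
```

For g = 2 and modulus 15 on four wires, the labels 1, 2, 4, 8 are cycled and the other twelve labels are fixed, so the table is a genuine permutation of the computational basis.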

A. Kitaev made the observation in [?] that simple circuitry will assist
in the estimation of one of the eigenvalues of such a map, U. Ignoring the
eigenspace of eigenvalue 1, the remaining eigenspaces of the domain of U
will each be 1-dimensional, and of the form

$$
|\lambda_\omega\rangle := |G|^{-1/2} \sum_{j=1}^{|G|} \omega^{-j}\, |g^j\rangle, \tag{8}
$$

for $\omega$ a complex |G|th root of unity.

Then, if the ancilla register were to hold such an eigenvector, we could
apply the gate $\Lambda_r(U_{\mathrm{anc}})$ (a gate likewise constructible
from generalised Toffoli maps), which implements U on the ancilla controlled
on the Z-basis setting of an input-register qubit r, and thus effect the
transformation

$$
|+\rangle_r = \frac{|0\rangle_r + |1\rangle_r}{\sqrt{2}}
\ \mapsto\ \frac{|0\rangle_r + \omega\,|1\rangle_r}{\sqrt{2}}
= \frac{(1+\omega)}{2}\,|+\rangle_r + \frac{(1-\omega)}{2}\,|-\rangle_r.
\tag{9}
$$

The classical bit issued from the measurement (in the Hadamard basis)
of the register wire would therefore be biased as
$[\cos^2(\theta) : \sin^2(\theta)]$, where $\omega =: \exp(2i\theta)$. If we
would rather not make a Hadamard-basis measurement for classical
postprocessing but would instead process the data within the quantum
circuit, then we may of course simply transfer the data back into the
computational basis with a $H^{\infty}$ operation, instead of measuring it.
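The bias $[\cos^2(\theta) : \sin^2(\theta)]$ can be reproduced by a direct simulation of one controlled-U gate acting on an eigenvector. The sketch below again assumes, purely for illustration, that U is multiplication by g modulo an integer coprime to g; the function names are hypothetical, and the index k selects which root of unity $\omega = \exp(2\pi i k / |G|)$ is used.

```python
import cmath
import math

def eigenvector(g, modulus, k):
    """Eigenvector of x -> x*g mod modulus with eigenvalue exp(2*pi*i*k/order).
    Assumes gcd(g, modulus) = 1. Returns ({label: amplitude}, omega)."""
    powers, x = [], 1
    while True:
        x = (x * g) % modulus
        powers.append(x)              # powers[j-1] holds g^j
        if x == 1:
            break
    order = len(powers)
    omega = cmath.exp(2j * math.pi * k / order)
    amps = {gj: omega ** (-j) / math.sqrt(order)
            for j, gj in enumerate(powers, start=1)}
    return amps, omega

def plus_probability(g, modulus, k):
    """Returns (simulated Prob(+) on the control wire, cos^2(theta))."""
    anc, omega = eigenvector(g, modulus, k)
    # Joint state (|0>|anc> + |1> U|anc>) / sqrt(2), keyed by (control bit, label).
    joint = {}
    for x, amp in anc.items():
        joint[(0, x)] = joint.get((0, x), 0) + amp / math.sqrt(2)
        ux = (x * g) % modulus
        joint[(1, ux)] = joint.get((1, ux), 0) + amp / math.sqrt(2)
    # Project the control onto |+> = (|0> + |1>)/sqrt(2) for each ancilla label.
    prob = sum(abs((joint[(0, x)] + joint[(1, x)]) / math.sqrt(2)) ** 2 for x in anc)
    theta = cmath.phase(omega) / 2    # omega = exp(2 i theta)
    return prob, math.cos(theta) ** 2
```

For g = 2 modulo 15 the group has order 4, so k = 1 gives $\omega = i$ and an unbiased bit, while k = 2 gives $\omega = -1$ and a bit that is always found in $|-\rangle$.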
The aim in any rendition of Shor's Algorithm is not to consider such
data in isolation but to chain together several of these
$\Lambda_r(U_{\mathrm{anc}})$ gates, using a different register wire r for
each. Note that these gates all commute, so there is no need to prespecify
the order in which they are to be performed. In this manner, many bits are
provided for assisting in the estimation of $\theta$.

As it stands, this algorithm doesn't amount to much, because the classical
bits issued by such a circuit are highly correlated and carry little
information collectively. Indeed, one would obtain a remarkably poor
estimate of $\theta$ if this approach alone were relied upon. (It would
require exponentially many samples of the
$[\cos^2(\theta) : \sin^2(\theta)]$ probability distribution to estimate
$\omega$ to a sufficient accuracy.) The real key to Kitaev's observation is
to use other unitary maps that have the same eigenspaces. These are the maps
of the form $\Lambda_r(U^{c})$, and they all commute. By chaining these maps
into the computation, data can be collected not just on some specific
$\omega$ eigenvalue, but on its powers, $\omega^{c}$, also. This process can
be extended to include as many commuting maps as we have patience and spare
wires for, provided that it is no more difficult to find circuitry for
implementing $\Lambda_r(U^{c})$.



4.3 Ancilla in Order-Finding 

In practice, it will not be possible to load the ancilla register with a
non-trivial eigenvector. Instead, we load it with an arbitrary state.
Provided that this initial ancilla state does not overlap much with the
eigenvalue-1 eigenspace, the effect of the Order-Finding part of the
algorithm just described will be to generate within the input register data
that is in a superposition of states which (when understood in the Hadamard
basis) estimate the different possible $\theta$ values. To a great extent,
it makes no difference whether the ancilla superposition is coherent or
incoherent, so either an arbitrary pure state or a mixed state could be
used. Even a fully-mixed state can be used for the ancilla, since the
eigenvalue-1 eigenspace doesn't dominate the 'useful' spaces
superpolynomially.

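To make this more explicit, we show how to determine a stochastically
chosen $\theta$ in the case where it is easy to find simple circuits for
exponentially high powers of a certain U. Starting with $a \cdot b$ input
wires, each initialised as before to $|+\rangle$; for
$j \in \{1, 2, \ldots, a\}$, for $c_j$ being an exponentially growing series
of positive integers, we can reserve b wires for each value of j, each for
controlling one implementation of $U^{c_j}$. We are free to choose any
values we like for the $c_j$, but not all choices will be appropriate for
completing the task of eigenvalue estimation. More will be said about the
$c_j$ values later. Writing $\alpha_\omega$ for the amplitudes of the
initial ancilla state in the eigenbasis of U, the state after these
controlled gates is

$$
\sum_{\omega} \alpha_\omega\, |\lambda_\omega\rangle \otimes
\bigotimes_{j=1}^{a} \left( \frac{(1+\omega^{c_j})}{2}\,|+\rangle
+ \frac{(1-\omega^{c_j})}{2}\,|-\rangle \right)^{\otimes b}.
\tag{10}
$$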
The traditional rendering of Shor's algorithm proceeds by applying an
appropriate Quantum Fourier Transform (QFT) to the input register at
this stage [2], but since the QFT is not readily implementable using small
numbers of the gates that we are wishing to consider, we shall instead
consider this output state (10) to conclude the Order-Finding part of our
algorithm.

4.4 Quantum Depth 

Sometimes it is convenient to think of the full algorithm as having
quantum-depth of 1, allowing $|+\rangle$ states as input and where the two
parts (order-finding and post-processing) are separated by a $H^{\infty}$
operator. By performing a $H^{\infty}$ on the output state (10), it is clear
that the data in which we're interested are transformed from the X-basis to
the Z-basis, where they can be processed with Toffoli gates.

Sometimes it is better to think of our algorithm as having quantum-depth 0,
where the postprocessing part is performed not inside the circuit but on a
classical computer after the quantum algorithm has ended and the circuit
output has been measured in the X-basis (assuming X-basis measurement is
available).

Sometimes it makes more sense to think of our algorithm as having
quantum-depth 2, for then we are free to use the canonical $|0\rangle$ input
on each wire, and require no measurement in the X-basis at all.

Therefore, rather than trying to answer definitively the question, "What
is the quantum-depth of our implementation of Shor's Algorithm?" we shall
instead proceed simply by describing how classical processing of samples
from

$$
\{\ x_j \sim \mathrm{Binomial}(\, b,\ \sin^2(\theta \cdot c_j) \,)\ \} \tag{11}
$$

can be used to provide a good enough estimate of $\omega$, where $\omega$ is
chosen at random with probability $|\alpha_\omega|^2$ (see expression (10)),
and $\omega =: \exp(2i\theta)$. [To be clear, $x_j$ will be an integer
between 0 and b, most likely around about $b \cdot \sin^2(\theta \cdot c_j)$,
&c.]
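For experimenting with the classical side of the problem, the samples described by (11) are easy to generate on an ordinary computer. The following is a minimal sketch, not the procedure used in the experiments reported below; the function name `sample_counts` and its parameters are illustrative, and the random generator is passed in so that runs can be made reproducible.

```python
import math
import random

def sample_counts(theta, multipliers, b, rng):
    """Draw x_j ~ Binomial(b, sin^2(theta * c_j)) for each c_j in `multipliers`.

    Each x_j counts how many of the b control wires reserved for c_j would be
    found in |-> under a Hadamard-basis measurement."""
    counts = []
    for c in multipliers:
        p = math.sin(theta * c) ** 2
        counts.append(sum(rng.random() < p for _ in range(b)))
    return counts
```

A call such as `sample_counts(0.3, [1, 2, 4, 8], 6, random.Random(0))` returns one integer between 0 and 6 per multiplier; an estimator for $\theta$ would then be run on these counts.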

4.5 Parallelisation of Order Finding 

Before proceeding to explain the (classical) computational tasks involved in
processing the output of Order Finding, we make a few comments about
the circuitry used so far. Toffoli circuitry is used (as explained in
subsection 4.2) to implement

$$
\prod_{j} \Lambda_{R_j}\!\left(U^{c_j}_{\mathrm{anc}}\right), \tag{12}
$$

where each $R_j$ is a set of b wires used to control the application of the
gate $U^{c_j}$, which is just a modular-multiplication mapping. It is well
known (e.g. see [?] for the case of integer multiplication) that circuitry
for this kind of functionality can frequently be constructed to have overall
polylog depth in the input length, with a merely polynomial overhead in the
circuit width, paid for by addition of certain extra ancilla wires that are
to be initialised to $|0\rangle$. To provide such ancillae within our model
would add a small constant factor to the overall quantum-depth of the
algorithm, as well as increasing its width; but if we are prepared to pay
such a cost, then the circuitry of Order Finding can be said to be
'exponentially parallelisable'.

4.6 Eigenvalue Estimation 

Having rejected the Quantum Fourier Transform as a way of recovering the
desired (classical-description) approximation to $\omega$, we must instead
address the purely classical problem of figuring out how to choose the
parameters a, b, and $c_j$, and how to use the samples $x_j$ in order to form
the approximation to $\omega$ in an efficient manner. This is not an
altogether straightforward task, because it depends on the problem being
solved, e.g. one chooses parameters slightly differently for the Integer
Factorisation problem than for the Discrete Logarithm problem.

A little experimentation in the general case, implementing a sampling
strategy for random $\theta$ in expression (11) on a (classical!) computer,
has led us to the conclusion that the optimal choice for $c_j$ is not
consistently $2^j$ as used with the QFT method (see [?]). This is because in
the case where the binary expansion of $\theta$ has a long run of zeroes or
ones after j places, then there is remarkably little data in $x_j$.

Experimentally, we found that if $\theta$ needs to be estimated to within
one part in K, then it suffices (for a reasonable chance of estimating
$\theta$ correctly) to take $b = 3 \log\log(K)$ and to let the $c_j$ take
every value between 1 and K that either has just one bit set or else has two
bits set which are not more than 5/2 places apart. Using those $c_j$ values
with two bits set enables one to write code which can successfully 'bridge
over' those regions of the Real line (for $\theta$) that include numbers
whose binary expansions have early long runs of zeroes or ones. The precise
details of this technique are omitted for brevity, but the problem is not
technically difficult. Proving rigorous limits in this regime however
remains an open problem.
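The family of multipliers described above can be enumerated directly. The sketch below is illustrative only: the function name `candidate_multipliers` is hypothetical, and the allowed separation between the two set bits is exposed as an integer parameter `max_gap` (an assumption of this example) rather than fixed to the value quoted in the text.

```python
def candidate_multipliers(K, max_gap):
    """All integers in [1, K] with exactly one bit set, or with exactly two
    bits set that are at most `max_gap` binary places apart."""
    values = set()
    for hi in range(K.bit_length()):
        if (1 << hi) <= K:
            values.add(1 << hi)                   # single set bit
        for lo in range(max(0, hi - max_gap), hi):
            c = (1 << hi) | (1 << lo)             # two set bits, hi - lo <= max_gap
            if c <= K:
                values.add(c)
    return sorted(values)
```

For K = 16 and a gap of at most two places this yields ten multipliers, compared with the five powers of two that the QFT-style choice would use, so the number of multipliers grows only by a constant factor.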

In the case where $\theta$ is known a priori to be a multiple of
$2\pi/\varphi$ for some unknown integer $\varphi \sim \sqrt{K}$, then this
resolution suffices (with constant probability) to recover $\theta$ exactly,
by the method of continued fractions. There is extensive literature on
applying Eigenvalue Estimation to specific problem instances.
4.7 Generalising "Quantum-Depth = 2"

We saw in the preceding subsections that (a version of) Shor's Algorithm may
be implemented in our computational model with overall quantum-depth of 2,
starting from a trivial Z-basis state and ending with a Z-basis measurement.
Consider now a 'general' algorithm of quantum-depth 2, and the computational
path it follows. Starting from Z-basis state $|a\rangle$ (where $a \neq 0$)
on n wires; apply a $H^{\infty}$ map; then a Z-basis-permutation T; then
another $H^{\infty}$; then measure the result $|c\rangle$ in the Z-basis.
This computation, and the resulting probability distribution, may be denoted
by

$$
\begin{aligned}
|a\rangle\ &\mapsto\ 2^{-n} \sum_{b,c} (-1)^{\langle a,b\rangle} (-1)^{\langle T(b),c\rangle}\, |c\rangle \\
&\Rightarrow\ \Big\{\ \mathrm{Prob}(c) = \Big|\, 2^{-n} \sum_{b} (-1)^{\langle a,b\rangle} (-1)^{\langle T(b),c\rangle} \Big|^2\ \Big\} \\
&\equiv\ \begin{cases}
\mathrm{Prob}(0) = 0, \\
\mathrm{Prob}(c) = \Big|\, 2^{1-n} \sum_{\langle a,b\rangle = 0} (-1)^{\langle T(b),c\rangle} \Big|^2 \quad (c \neq 0).
\end{cases}
\end{aligned}
\tag{13}
$$

It is somewhat surprising to reflect that this little formula contains
virtually all the power needed to factorise large integers. Note that as
well as being implementable in the model this paper has described,
algorithms with this low quantum-depth are also effectively implementable
(modulo classical postprocessing) on a quantum computer that has access only
to gates of the form $H^{\infty} \cdot \Lambda_C(X_T) \cdot H^{\infty}$,
together with Z-basis inputs and outputs. [Note that despite our notation,
this gate acts trivially on all but a constant number of qubits, and so it
is 'small' in the usual sense.] While such a machine could help us factorise
large integers, it would be of little use for ordinary 'classically easy'
tasks. The example of this "conjugated Toffoli gate" essentially resolves
one of the open problems (problem 4) listed in [2]: the gate is neither
classically simulable nor universal for quantum computation in any
reasonable sense. This assertion is justified by the observations: 1) if it
were classically simulable, then Order Finding would be classically
simulable too, and hence Factorisation would be in BPP; 2) if it were
universal for BQP then a polynomial usage of the conjugate-Toffoli could
approximate one usage of an ordinary Toffoli gate, so by symmetry a
polynomial usage of the Toffoli gate could approximate one conjugate-Toffoli
gate, and hence the latter would be classically simulable. To find uses for
such a "conjugate-Toffoli machine", beyond applications of Shor's Algorithm,
we leave as an open problem.
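The closed form for the output distribution of a quantum-depth-2 circuit can be compared against a brute-force simulation on a few wires. This is a sketch under stated assumptions: basis labels are integers whose binary digits are the qubit values, the inner product is taken modulo 2, the permutation T is supplied as a lookup table, and the function names are illustrative.

```python
import math
import random

def global_hadamard(state):
    """H on every qubit, i.e. a normalised Walsh-Hadamard transform."""
    out, h = list(state), 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                out[i], out[i + h] = out[i] + out[i + h], out[i] - out[i + h]
        h *= 2
    return [amp / math.sqrt(len(out)) for amp in out]

def inner(u, v):
    """Mod-2 inner product of two bit-string labels."""
    return bin(u & v).count("1") % 2

def simulated_distribution(a, perm, n):
    """Run |a> through H, the permutation T, and H again; return Prob(c)."""
    state = [0.0] * (2 ** n)
    state[a] = 1.0
    state = global_hadamard(state)
    permuted = [0.0] * len(state)
    for b, amp in enumerate(state):
        permuted[perm[b]] = amp           # T|b> = |T(b)>
    return [amp ** 2 for amp in global_hadamard(permuted)]

def formula_distribution(a, perm, n):
    """Closed form: Prob(0) = 0 and a restricted sign-sum for c != 0 (a != 0)."""
    probs = [0.0]
    for c in range(1, 2 ** n):
        s = sum((-1) ** inner(perm[b], c)
                for b in range(2 ** n) if inner(a, b) == 0)
        probs.append((2.0 ** (1 - n) * s) ** 2)
    return probs
```

All amplitudes here are real, so squaring them gives the probabilities. The restricted sum runs over only half of the labels b because, for $c \neq 0$, the two halves with $\langle a,b\rangle = 0$ and $\langle a,b\rangle = 1$ contribute equally.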
5 Grover's Algorithm



5.1 Simple Oracle for the Unstructured Search Problem 

Define a simple Grover Oracle $G_x$, with hidden data x, to be an
application of the following one-register phase-flip map. (Note that the
Grover Oracle does not return a description of a circuit for implementing
that mapping, rather it simply returns an application of that mapping, as
per the 'black-box' model.)

$$
G_x |z\rangle := (-1)^{\{\, x = f(z) \,\}}\, |z\rangle. \tag{14}
$$

Here f is taken to be a fixed function acting on the labels of the
computational basis of the register, and $\{\, x = f(z) \,\}$ is 1 if
x = f(z) and zero otherwise. We let N denote the cardinality of the range of
f. It is intended that f have very little complexity: typically it will just
check the first log(N) qubits of the register.

A simple Grover Oracle for an unstructured search problem may be simulated
in the model pertinent to our present study, provided that there is an
efficient way of determining classically whether x = f(z) for any given z.
For example, this can be managed with a two-qubit ancilla in the state
$|1{-}\rangle$, so that no matter how many times $H^{\infty}$ has been
applied, there will still be a qubit in the state $|-\rangle$ on which to
target a generalised Toffoli gate, so as to implement the necessary
sign-flip that the oracle (simulation) requires.
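The sign-flip obtained by targeting a wire held in $|-\rangle$ can be seen in a few lines of simulation. The sketch below is illustrative: `marked` is a hypothetical classical predicate standing in for the test x = f(z), only a single $|-\rangle$ ancilla wire is modelled, and the function name is not from the paper.

```python
import math

def oracle_via_toffoli(register_amps, marked):
    """Simulate the phase oracle by a classical bit-flip on an ancilla in |->.

    `register_amps[z]` is the amplitude of |z>; `marked(z)` returns True or
    False. Returns the register amplitudes after the gate, read relative to
    the ancilla, which is left in |->."""
    r = 1 / math.sqrt(2)
    # Joint amplitudes keyed by (z, ancilla bit); |-> = (|0> - |1>)/sqrt(2).
    joint = {(z, bit): amp * (r if bit == 0 else -r)
             for z, amp in enumerate(register_amps) for bit in (0, 1)}
    # The classical gate: flip the ancilla bit exactly when z is marked.
    flipped = {(z, bit ^ marked(z)): amp for (z, bit), amp in joint.items()}
    # The ancilla is still |->, so the |0> component carries the new amplitude.
    return [flipped[(z, 0)] / r for z in range(len(register_amps))]
```

Applied to a uniform superposition over four labels with `marked = lambda z: z == 2`, only the amplitude of label 2 changes sign, which is the action of the oracle on the register.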

5.2 Search Algorithm Notation 

A Grover Search Algorithm expressed in our model comprises a circuit of
$\Lambda_C(X_T)$ gates, $G_x$ gates and $H^{\infty}$ gates, and an input,
and a computational basis measurement at the output which, with high
probability, will result in a $|z\rangle$ that indicates the hidden data
x = f(z). We are interested in lower-bounding not the depth of the circuit
but rather its quantum-depth and the number of oracle calls used. The
notation for the algorithm is

$$
G_x^{k_T} \cdot H^{\infty} \cdots G_x^{k_2} \cdot H^{\infty} \cdot G_x^{k_1} \cdot |\psi_0\rangle
$$

where

$$
G_x^{k_t} := T_{t,k_t} \cdot G_x \cdots T_{t,2} \cdot G_x \cdot T_{t,1} \cdot G_x \cdot T_{t,0}, \tag{15}
$$

where $T_{t,j}$ are predetermined permutations of the Z basis. The
quantum-depth of this circuit is T − 1 and the number of oracle calls is
$\sum_{t=1}^{T} k_t$.

It would be trivial to specify how to implement Grover's Algorithm if
we didn't care about quantum-depth; simply interleave calls to the oracle
with maps of the form $H^{\infty} \cdot T_0 \cdot H^{\infty}$, for some
appropriate $T_0$. But the questions we should like to ask concern
lower-bounds and trade-offs between the quantum-depth and the number of
oracle calls required.
5.3 Categories of Success

To be precise, we must state for which kinds of algorithm we are asking
for bounds. Consider deterministic algorithms, algorithms that have
worst-case probability above some bound, and algorithms that have
average-case probability above some bound.

Deterministic algorithms are not the focus of this study. While of the- 
oretic interest, such algorithms are less amenable to the kinds of analysis 
that we wish to deploy, and the model described in this article probably has 
little to contribute to their study. 

Algorithms with bounded worst-case probability are those for which, no
matter what the hidden data turns out to be, the probability of the
algorithm succeeding exceeds $1 - \varepsilon$ for some constant
$\varepsilon$. Since the success probability of an algorithm can always be
'amplified' by running it several times (either in series or in parallel),
without loss of generality we always take $\varepsilon$ to be less than 1/2.
This kind of algorithm is probably the easiest to bound (see [?] for a good
general technique for non-geometric proofs in this area), but they shall not
be of primary concern to us here.

Algorithms with bounded average-case probability are the most relevant
for an algorithm designer, since they allow for the possibility of breaking
some symmetry in the problem in such a way that certain answers will be
substantially easier to recover. They are algorithms for which a priori the
success probability exceeds $1 - \varepsilon$.

For highly symmetric problems (such as Unstructured Search) one generally
expects such algorithms to be the same as those with bounded worst-case
probability, but the techniques for proving average-case bounds can
sometimes be different. Given many instances of a certain kind of
'naturally arising' problem, an algorithm with bounded average-case
probability will be most welcome, since it will almost certainly serve to
address many of those instances. It is of less use if the instances in hand
are actually derived from a different kind of problem that just happens to
lie 'nearby' in terms of complexity, since there might be no reason to
assume that what is 'average' for one class of problem will translate into
average instances as derived from that class.

Henceforth we will consider only lower bounds for the resource require- 
ments of algorithms with bounded average-case probability, since a lower 
bound for resource requirements in the average case implies a lower bound 
for resource requirements in the worst case, but the converse does not follow. 

5.4 Lower bounds 

In this section we give some preliminary lemmas leading to a proof of the 
following : 
Theorem 1 Any Grover Search Algorithm, as defined in section 5.2 (with
the notation of equation (15)), with some constant lower bound for its
average-case expected probability of success, will require sufficient oracle
calls between $H^{\infty}$ stages so that

$$
\sum_{t=1}^{T} \sqrt{k_t} = \Omega(\sqrt{N}) \tag{16}
$$

holds true asymptotically.

Note that this means an algorithm might make one oracle call in each of
~ $\sqrt{N}$ quantum-depth-phases, or it might make ~ N calls without any
quantum-depth, or it might interpolate these extremes.
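The trade-off expressed by (16) can be made concrete by tabulating a few schedules of oracle calls. The sketch below only evaluates the left side of (16) for a given list of $k_t$ values; it does not claim that any particular schedule succeeds, and the hidden constant in the $\Omega$ is ignored. The function names are illustrative.

```python
import math

def sqrt_budget(calls_per_phase):
    """Left side of (16): the sum over phases of sqrt(k_t)."""
    return sum(math.sqrt(k) for k in calls_per_phase)

def schedule_summary(calls_per_phase):
    """Quantum-depth, total oracle calls and sqrt-budget of a schedule."""
    return {
        "quantum_depth": len(calls_per_phase) - 1,
        "oracle_calls": sum(calls_per_phase),
        "sqrt_budget": sqrt_budget(calls_per_phase),
    }
```

For N = 1024, the schedules `[1] * 32`, `[4] * 16` and `[1024]` all reach the same budget of 32, while using 32, 64 and 1024 oracle calls at quantum-depths 31, 15 and 0 respectively. In general, T phases of k calls each meet a budget of $\sqrt{N}$ with about $\sqrt{Nk}$ calls in total.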

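Lemma 1 Let $W_{x,t}$ be entries of a 2-dimensional array of non-negative
real numbers. Let $R_t = \sqrt{\sum_x W_{x,t}^2}$ be the 2-norm of the rows,
and let $C_x = \sum_t W_{x,t}$ be the 1-norm of the columns. Then let
$R = \sum_t R_t$ be the 1-norm of the Rs, and let
$C = \sqrt{\sum_x C_x^2}$ be the 2-norm of the Cs. Then $R \geq C$.

To prove this, let $\mathbf{t}$ be the vector whose xth entry is $W_{x,t}$.
Then using the Euclidean inner product, we can rewrite
$R_t = \sqrt{\langle \mathbf{t}, \mathbf{t} \rangle}$. By expanding the
expression for C we get
$C^2 = \sum_{s,t} \langle \mathbf{s}, \mathbf{t} \rangle$. Likewise,
$R^2 = \big( \sum_t \sqrt{\langle \mathbf{t}, \mathbf{t} \rangle} \big)^2
= \sum_{s,t} \sqrt{\langle \mathbf{s}, \mathbf{s} \rangle \langle \mathbf{t}, \mathbf{t} \rangle}$.
So it suffices to show that always

$$
\sqrt{\langle \mathbf{s}, \mathbf{s} \rangle \cdot \langle \mathbf{t}, \mathbf{t} \rangle} \ \geq\ \langle \mathbf{s}, \mathbf{t} \rangle, \tag{17}
$$

but this is a basic (Euclidean) trigonometric inequality. ■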
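A quick numerical sanity check of Lemma 1 is sketched below. It is no substitute for the proof; it simply evaluates R and C for a given array, with `w[x][t]` holding the entry $W_{x,t}$. The function name is illustrative.

```python
import math
import random

def row_column_norms(w):
    """Return (R, C) for a non-negative array w[x][t].

    R is the sum over t of the 2-norm (taken over x) of each slice;
    C is the 2-norm over x of the sum over t."""
    n_x, n_t = len(w), len(w[0])
    R = sum(math.sqrt(sum(w[x][t] ** 2 for x in range(n_x))) for t in range(n_t))
    C = math.sqrt(sum(sum(w[x][t] for t in range(n_t)) ** 2 for x in range(n_x)))
    return R, C
```

For the 2 by 2 identity array this gives R = 2 and C = $\sqrt{2}$, and the two values coincide when all the slices are proportional to one another, which is the equality case of (17).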

Lemma 2 Given three unit vectors in a Euclidean Geometry, the sum of
the absolute values of the sines of any two inter-angles is no less than the
absolute value of the sine of the third inter-angle.

This is trivial in two dimensions. The general problem is three-dimensional.
Consider three unit vectors whose components are (1, 0, 0), (a, b, 0), and
(c, d, e), with $b \geq 0$. (No generality is lost with this choice.) Then
it only remains to show that

$$
b + \sqrt{1 - c^2} \ \geq\ \sqrt{1 - (ac + bd)^2}. \tag{18}
$$

We begin by squaring both sides, so that we are required to show that

$$
b^2 + 1 - c^2 + 2b\sqrt{1 - c^2} \ \geq\ 1 - (ac + bd)^2. \tag{19}
$$

Next, we try to balance between d and e. The case d = 0 immediately
satisfies (19). The case e = 0 reduces it to the requirement
$|ac + bd| \leq 1$, which is a basic trigonometric result, as the reader can
easily check. The only remaining case occurs when the partial derivative of
(19) w.r.t. d vanishes, i.e. when $(ac + bd)\,b = 0$. The b = 0 sub-case is
easily dismissed, and the other sub-case is defined by $ac = -bd$. We can
use this to reduce (19) to $b^2 - c^2 + 2b\sqrt{1 - c^2} \geq 0$, which by
solving for b is identical with the condition $b \geq 1 - \sqrt{1 - c^2}$ on
the range of validity. We can also use it to obtain
$(1 - b^2)\,c^2 = b^2 d^2$, and hence $|c| \leq b$. And this latter
inequality guarantees the condition sought:
$b \geq |c| \geq 1 - \sqrt{1 - c^2}$. ■
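Lemma 2 is a triangle inequality for the sines of inter-angles, and it can be spot-checked on random unit vectors. The sketch below is only a numerical illustration with hypothetical helper names; the vectors are drawn by normalising Gaussian samples.

```python
import math
import random

def sine_between(u, v):
    """Absolute value of the sine of the angle between unit vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.sqrt(max(0.0, 1.0 - dot * dot))

def random_unit_vector(dim, rng):
    """A unit vector with direction drawn from an isotropic Gaussian."""
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]
```

Because the sine depends only on the square of the inner product, the same check applies to vectors that differ by a sign, which is what makes the lemma suitable for comparing quantum states.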

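Lemma 3 For $(t_x)$ a list of real numbers,

$$
\sum_{x=1}^{N} t_x^2 \leq Np \ \implies\ \sum_{x=1}^{N} t_x \leq N\sqrt{p}. \tag{20}
$$

The variance of $(t_x)$ is non-negative, that is

$$
\frac{1}{N} \sum_{x=1}^{N} t_x^2 - \left( \frac{1}{N} \sum_{x=1}^{N} t_x \right)^2 \ \geq\ 0, \tag{21}
$$

and the lemma follows directly from this observation. ■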

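To prove theorem 1 we proceed as follows. Referring back to the notation
of section 5.2, let

$$
\begin{aligned}
U_t &:= H^{\infty} \cdot T_{t,k_t} \cdots T_{t,0}, \\
|\psi_t\rangle &:= U_t\, U_{t-1} \cdots U_1 \cdot |\psi_0\rangle; \\
P_{x,t} &:= U_t^{\dagger} \cdot H^{\infty} \cdot G_x^{k_t}; \\
|\psi_{x,s,t}\rangle &:= U_s \cdot P_{x,s} \cdots U_{t+1} \cdot P_{x,t+1} \cdot |\psi_t\rangle
\end{aligned}
\tag{22}
$$

and write

$$
\begin{aligned}
\mathrm{path}_{x,t}(z) &:= \{ f \cdot T_{t,0}(z) = x \} \oplus \cdots \oplus \{ f \cdot T_{t,k_t-1} \cdots T_{t,0}(z) = x \}; \\
W_{x,t}(|\phi\rangle) &:= \sum_{z} |\langle \phi | z \rangle|^2\, \{ \mathrm{path}_{x,t}(z) = 1 \} \quad (\leq 1);
\end{aligned}
\tag{23}
$$

Then it follows that

$$
P_{x,t} |z\rangle = (-1)^{\mathrm{path}_{x,t}(z)}\, |z\rangle; \tag{24}
$$

and states defined in (22) may be compared with each other according to

$$
\begin{aligned}
\langle \psi_{x,T,t} | \psi_{x,T,t-1} \rangle &= \langle \psi_{t-1} | \cdot P_{x,t} \cdot | \psi_{t-1} \rangle \\
1 - |\langle \psi_{x,T,t} | \psi_{x,T,t-1} \rangle|^2 &\leq 4\, W_{x,t}(|\psi_{t-1}\rangle).
\end{aligned}
\tag{25}
$$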
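The inequality in (25) follows from (24) in one short step, spelled out here for convenience. Write $W$ for the weight that the oracle-free state $|\psi_{t-1}\rangle$ places on labels z whose path parity is 1 (the quantity $W_{x,t}(|\psi_{t-1}\rangle)$ of (23)). Since $P_{x,t}$ is diagonal in the Z-basis with entries $\pm 1$,

$$
\langle \psi_{t-1} | P_{x,t} | \psi_{t-1} \rangle
= \sum_{z} |\langle \psi_{t-1} | z \rangle|^2\, (-1)^{\mathrm{path}_{x,t}(z)}
= (1 - W) - W = 1 - 2W,
$$

and therefore

$$
1 - (1 - 2W)^2 = 4W - 4W^2 = 4W(1 - W) \ \leq\ 4W .
$$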

Square-rooting expression (25), summing that over the t values, and further
approximating by many applications of lemma 2, then squaring, we obtain

$$
1 - |\langle \psi_T | \psi_{x,T,0} \rangle|^2 \ \leq\ 4 \left( \sum_{t=1}^{T} \sqrt{W_{x,t}(|\psi_{t-1}\rangle)} \right)^2. \tag{26}
$$
Using the trigonometric notation

$$
\cos(|\psi_T\rangle, |\psi_{x,T,0}\rangle) = |\langle \psi_T | \psi_{x,T,0} \rangle| \tag{27}
$$

and summing expression (26) over the x values, we obtain

$$
\sum_{x} \sin^2(|\psi_T\rangle, |\psi_{x,T,0}\rangle) \ \leq\ 4 \sum_{x} \left( \sum_{t=1}^{T} \sqrt{W_{x,t}(|\psi_{t-1}\rangle)} \right)^2. \tag{28}
$$

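Denote by F the left side of this inequality. Apply lemma 1 to the right
side of this inequality after square-rooting, and it yields

$$
\sqrt{F} \ \leq\ 2 \sum_{t=1}^{T} \sqrt{\sum_{x} W_{x,t}(|\psi_{t-1}\rangle)}. \tag{29}
$$

Now, by equation (23), we know that

$$
\begin{aligned}
\sum_{x} W_{x,t}(|\psi_{t-1}\rangle) &= \sum_{z} |\langle \psi_{t-1} | z \rangle|^2 \sum_{x} \{ \mathrm{path}_{x,t}(z) = 1 \} \\
&\leq \sum_{z} |\langle \psi_{t-1} | z \rangle|^2 \cdot k_t \\
&= k_t.
\end{aligned}
\tag{30}
$$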
So if we put these observations together (substitute (30) into (29)), we see

$$
F \ \leq\ 4 \left( \sum_{t=1}^{T} \sqrt{k_t} \right)^2. \tag{31}
$$



It suffices to show that this F is lower-bounded by $\Omega(N)$ whenever the
algorithm has expected success-probability $\Omega(1)$. Let
$|\zeta_x\rangle$ denote the closest unit vector to $|\psi_{x,T,0}\rangle$
that is in the span of $\{ |z\rangle : f(z) = x \}$. Since the output of the
algorithm is $H^{\infty} \cdot |\psi_{x,T,0}\rangle$, we can use a
measurement in the basis given by $H^{\infty} |\zeta_x\rangle$, and by the
Born rule establish that the expected probability of success is

$$
p = N^{-1} \sum_{x} \cos^2(|\psi_{x,T,0}\rangle, |\zeta_x\rangle), \tag{32}
$$

which we rewrite as

$$
\sum_{x} \sin^2(|\psi_{x,T,0}\rangle, |\zeta_x\rangle) = N(1 - p). \tag{33}
$$

It follows from lemma 3 that

$$
\sum_{x} |\sin(|\psi_{x,T,0}\rangle, |\zeta_x\rangle)| \ \leq\ N\sqrt{1 - p}. \tag{34}
$$
And since the $|\zeta_x\rangle$ are all mutually orthogonal,
$\sum_x \cos^2(|\psi_T\rangle, |\zeta_x\rangle) \leq 1$, which we write as

$$
\sum_{x} \sin^2(|\psi_T\rangle, |\zeta_x\rangle) \ \geq\ N - 1. \tag{35}
$$

Now apply lemma 3 directly to the definition of F (LHS of inequality (28))
to obtain

$$
\sum_{x} |\sin(|\psi_T\rangle, |\psi_{x,T,0}\rangle)| \ \leq\ \sqrt{N \cdot F}. \tag{36}
$$

Putting together lines (34), (35), (36), and applying lemma 2, we obtain

$$
\sqrt{N \cdot F} + N\sqrt{1 - p} \ \geq\ N - 1, \tag{37}
$$

whence $p = \Omega(1) \implies F = \Omega(N)$, as required. ■

6 Conclusions 

We have introduced a new theoretical model for quantum circuits, designed
to highlight one aspect of the way in which quantum computation differs
from classical computation. We have thence illustrated a little of what can
be achieved within limited quantum-depth, by analysis of the two main
(or well-known) algorithms of Quantum Information Processing; showing
that 'hard' classical problems can sometimes be 'solved quantumly' using
only Toffoli gates, and using the model to add to the growing literature
on "algorithmic trade-offs". We have developed a few tools for facilitating
these analyses, and exemplified a quantum gate that is probably neither
BQP universal nor classically simulable. Further exploration of the power
of computing within very small (e.g. constant) quantum depth remains an
interesting issue for future research.